ABSTRACT

This paper describes a study of selected search engines in which projected data were generated from a data series
collected over 100 days. The search engines selected for the study are Google, Bing, Yahoo! and Baidu, and the data
series was obtained using the simple keyword “Citation” from the field of Library and Information Science. Forecasting
was carried out by time series analysis: 100 days of samples were collected, and the trend projection method was then
used to generate 50 days of forecasted data, which were evaluated. The evaluation reveals that Yahoo! shows a positive
secular trend, while Google, Bing and Baidu show a downward or negative secular trend.
INTRODUCTION

In the last two decades the Web has established itself as an important source of information in society. The major
activity performed on the Web is searching for information for one’s research purposes (Madden, 2003; Fallows, 2004),
which can be accessed using various search engines (Sullivan, 2005). However, the results yielded for many queries run
into several thousands or even millions because of the vast amount of information available. Many studies show that
users browse only the first few results, on average only two pages with a default of 10 results per page, a total of
20 results (Silverstein, Henzinger, Marais & Moricz, 1999; Spink, Ozmutlu, Ozmutlu & Jansen, 2002; Jansen & Spink,
2004; Jansen, Spink & Pedersen, 2005). This determines the success of a search engine, and result ranking therefore
holds the utmost importance. In classical Information Retrieval systems, result ranking was based merely on term
frequency and inverse document frequency (Baeza-Yates & Ribeiro-Neto, 1999). In Web search, various parameters are
taken into account in ranking results, such as the number of links pointing to a given web page (Brin & Page, 1998;
Google, 2016), the anchor text of the links pointing to the web page, the placement of the search terms in the document
(terms occurring in the title or a header may get a higher weight), the distance between the search terms, the
popularity of the page (in terms of the number of times it is visited), the text appearing in metatags (Yahoo, 2016),
the subject-specific authority of the web page (Kleinberg, 1999; Teoma, 2005), recency in the search index and the
exactness of the hits (MSN, 2016). There is an ongoing competition between search engines for users and between Web
page authors for high rankings, which is why ranking algorithms are kept secret by the search engine companies. As
Google states (Google, 2016), "Due to the nature of our business and our interest in protecting the integrity of our
search results, this is the only information we make available to the public about our ranking system". Apart from
this, search engines keep updating and upgrading their algorithms in order to improve their ranking of results.
Nowadays there is a search engine optimization industry that designs and redesigns Web pages in order to enhance their
rankings within a specific search engine (e.g., Search Engine Optimization Inc., www.seoine.com/). In essence, it can
be concluded that the first ten results retrieved for a query have the greatest chance of being visited by users.
Changes over time in the top ten results for a query have also been examined for the largest search engines, which at
the time of the first data collection were Google, Yahoo! and Teoma (MSN Search came out of beta on 1st February 2005,
in the midst of data collection for the second round) (Payne, 2005). However, various transformations take place
between the user's "visceral need" (a fuzzy view of the information problem in the user's mind) and the "compromised
need" (the way the query is phrased, taking into account the limitations of the search tool at hand) (Taylor, 2009).
Above all, the fluctuation of the results for a query can only be judged by the user, although some researchers claim
that this is impractical because the large number of documents related to a query cannot all be viewed by the user;
hence a panel of judges is required for checking fluctuation (Gordon & Pathak, 1999; TREC, 2014).

Problem 

In the beginning the Internet was simple: basic software was used to search for information on the web, and it was
usually command driven rather than based on a graphical interface. With the proliferation of information, systems such
as Archie, Gopher and Veronica became increasingly unable to cope with the huge volume of information. The advent of
many types of search engines provided a solution for literature searching using Boolean operators, proximity searching,
wild cards, truncation, etc. Many search engines developed new versions and techniques to achieve some degree of
sophistication, but not all of them have advanced the cause of access and searching from the scholar’s perspective.
Besides indexing the Internet in different ways, search engines operate in different ways and retrieve documents in
different orders. Further, a search engine does not sift information from the scholar’s point of view, i.e., it
retrieves information on a particular topic from different aspects such as marketing, advertisement, news and
entertainment, mixed with some research papers, whereas a scholar attempts to look purely for scholarly information on
his or her topic of interest, with output that is comprehensive and devoid of fluctuations.

The present investigation attempts to evaluate the performance of the select search engines in terms of fluctuation
captured in two phases, in order to check the consistency of the search engines.

Objectives 

The following objectives are laid down for the study: 

• To select search engines. 

• To select search term for the study. 

• To collect data for 100 days. 

• To compare trends by forecasting through time series analysis.

Method 

As certified by the International Standard Organization, there are 230 search engines (Promote3.com, 2016) available
for searching the web. These search engines are of various types, such as general search engines, robotic search
engines, meta search engines, directories and specialized search engines. Most users prefer robotic search engines, as
they allow users to compose their own queries rather than simply follow pre-specified search paths or hierarchies, as
in the case of directories. Moreover, robotic search engines all locate data in a similar way, i.e., by the use of
crawlers or worms. This feature differentiates them from web directories like Yahoo!, where collections of links to
retrieve URLs are created and maintained by subject experts or by means of some automated indexing process. However,
some of these services also include a robot-driven search engine facility, although this is not their primary purpose.
Owing to this feature, Yahoo! was included in the study.

Meta search engines, e.g., Dogpile, do not have their own databases. They access the databases of many robotic
search engines simultaneously. Thus they were excluded from the study.

Still, hundreds of robotic general search engines navigate the web. In order to limit the scope of the study, after a
preliminary study the following criteria were laid down for the selection of general search engines:

• Availability of automated indexing.

• Global coverage of data.

• Quick response time. 

• Availability of result counter. 

The following two general search engines were selected for the study, as they meet all the criteria and are
comprehensive in nature:

a) Google, b) Baidu.

Since the study relates to the field of Library and Information Science and there is no specialized search engine for
the subject, another specialized search engine that relates to the subject area, i.e., Bing, was taken for the study.
Thus the search engines undertaken for evaluation are:

• Google (General) 

• Bing (Specific) 

• Yahoo! (Directory) 

• Baidu (Country-Specific General Search Engine)

Selection of Terms 

The selection of terms is not straightforward in a developing and multidimensional field like Library and Information
Science. Therefore, classification schemes such as DDC (18th) and DDC (22nd) were consulted to understand the
broad/narrow structure of Library and Information Science. This helped to identify five terms/fields, i.e.,

• Information System. 

• Digital Library. 

• Library Automation. 

• Library Services. 
• Librarianship.

These terms were then browsed in the “LC list of subject Headings”, which provided many other related terms (RT)
and narrow terms (NT). The NT and RT attached to each preferred or standard term were also browsed, which retrieved a
large number of Library and Information Science terms. In the first instance, 140 Library and Information Science
related terms were identified.

Some terms occurred more than once, and the duplicates were removed, which reduced the number to 100. The terms were
then divided into three broad groups:

a) Application, b) Transformation, c) Inter-relation. 

“Application” denotes the utility of Library and Information Science in various fields, and about 50 terms came under
this group. “Transformation” refers to a method of developing or manufacturing library services for the practical
market, and 30 terms fell under this group. “Inter-relation” means the transformation/dependence of one subject onto
another, and 20 terms came under this group.

Further, each category is sub-divided into groups:

• “Application” into four, i.e., “Reference Service”, “Informatics”, “Information Retrieval” and “Information Sources”.

• “Transformation” into two, i.e., “Digitization” and “Consortia”.

• “Inter-relation” into two, i.e., “Library Network” and “Information System”.

The terms in each group were arranged alphabetically and each term was given a tag. Then 19% of the terms were
selected from each group using “Systematic Sampling” (i.e., the first item is selected randomly and each subsequent
item at a fixed interval). This further reduced the number to 19. Finally, the selected terms were classified into
three groups, “Simple”, “Compound” and “Complex Terms” (Table 1.0). This was done in order to investigate how search
engines control and handle simple and phrased terms.
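The systematic sampling step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `systematic_sample`, the `seed` parameter and the placeholder term list are hypothetical, and the study does not state how the 19% share was rounded within each group, so the sample size is left as a parameter.

```python
import random


def systematic_sample(items, sample_size, seed=None):
    """Select sample_size items: a random start, then every k-th item.

    `items` is assumed to be already arranged alphabetically, as in the study.
    """
    if not 0 < sample_size <= len(items):
        raise ValueError("sample_size must be between 1 and len(items)")
    rng = random.Random(seed)
    interval = len(items) / sample_size      # sampling interval k
    start = rng.random() * interval          # random start inside the first interval
    last = len(items) - 1
    # min() guards against a floating-point edge case at the end of the list
    return [items[min(int(start + i * interval), last)] for i in range(sample_size)]


# Hypothetical group of 50 tagged terms (placeholders, not the study's terms).
group = sorted(f"term{i:02d}" for i in range(50))
sample = systematic_sample(group, 10, seed=1)
```

With 50 items and a sample of 10, the interval is 5, so the selected items are exactly five positions apart after the random start.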

“Simple Terms”, containing a single word, were submitted to the search engines in their natural form, i.e., without
punctuation marks. “Compound Terms”, consisting of two words, were submitted to the search engines in the form of
phrases as suggested by the respective search engines, and “Complex Terms”, composed of more than two words or phrases,
were sent to the search engines with suitable Boolean operators “AND” and “OR” between the terms to perform special
searches. From the simple terms, the 2nd term, “Citation”, was taken for the study.
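As a rough illustration of the three submission forms, the sketch below builds a query string for each class of term. The function name `build_query` is hypothetical, and two details are assumptions not stated in the study: that phrase searching is expressed with double quotes, and that a complex term is split word by word with a single operator between the parts.

```python
def build_query(words, term_type, operator="AND"):
    """Format a term for submission according to its class.

    Assumptions (not from the study): phrases are wrapped in double quotes,
    and complex terms are joined word by word with one Boolean operator.
    """
    if term_type == "simple":
        if len(words) != 1:
            raise ValueError("a simple term has exactly one word")
        return words[0]                       # natural form, no punctuation
    if term_type == "compound":
        return '"' + " ".join(words) + '"'    # phrase search
    if term_type == "complex":
        if operator not in ("AND", "OR"):
            raise ValueError("operator must be 'AND' or 'OR'")
        return f" {operator} ".join(words)    # Boolean search
    raise ValueError(f"unknown term type: {term_type}")


simple_query = build_query(["Citation"], "simple")
compound_query = build_query(["Citation", "Analysis"], "compound")
complex_query = build_query(["Health", "Information", "System"], "complex")
```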


Table 1.0: Keywords

S. No  Simple Terms  Compound Terms               Complex Terms
1      Catchwork     Bibliometric Classification  Digital Library Open Source Software
2      Citation      Citation Analysis            Health Information System
3      Dublincore    Comparative Librarianship    Library Information System
4      Indexing      Digital Preservation         Library Information Network
5      Manuscript    Electronic Repositories      Multimedia Information Retrieval
6      Plagiarism    Library Automation           -
7      Reprints      Semantic web                 -



Fluctuation 

Information on the web is growing: documents are added on a routine basis, and the content keeps changing as
documents are removed or modified. These quantitative and qualitative changes are expressed as fluctuations. The
quantitative changes are expressed as “Result Fluctuations” and the qualitative changes as “Document” and “Indexing
Fluctuations”. A fluctuation may show a decrease or an increase in the number of documents. However, growth in the
size of the database is a continuous and usual routine of search engines. Thus both increase and decrease are taken
into account here.

A “Result Fluctuation” appears when a search engine shows an increase or decrease in the total number of results for
a query that is searched at two different points in time. In other words, the total number of results retrieved for a
query in the second observation may be less than that retrieved in the first observation. Thus a result fluctuation
appears when there is an increase or decrease in the number of results for a query tested over time, i.e., the number
of results in a succeeding observation may be more or less than in the preceding observation.
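The definition above amounts to comparing each observation with the one before it. A minimal sketch, assuming the daily result counts are held in a list; the function names and the sample counts are hypothetical and are not taken from the study's data:

```python
def result_fluctuations(counts):
    """Return the change in result count between successive observations."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


def describe(change):
    """Label a single change as an increase, a decrease or no change."""
    if change > 0:
        return "increase"
    if change < 0:
        return "decrease"
    return "no change"


# Hypothetical daily result counts for one query on one search engine.
daily_counts = [1200, 1350, 1350, 1100]
changes = result_fluctuations(daily_counts)
labels = [describe(c) for c in changes]
```

A series of n observations yields n - 1 changes, one for each pair of consecutive days.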

A forecast is an estimate of a future event achieved by systematically combining data about the past and casting it
forward in a predetermined way. It is simply a statement about the future. Forecasts are possible only when a history
of data exists. The study collected 100 days of data samples from four search engines out of seven, as a result counter
was available with Google, Bing, Yahoo! and Baidu. Data collection began on 15th May, 2016 and ended on 18th August,
2016, yielding 100 samples for the keyword “Citation” in each of the four search engines (Table 1.1).

For the forecasting process, the following points were taken into consideration:

• Fluctuation of search results and sustainability.

• 100 days of data sampling were taken into consideration (Table 1.1).

• As the data is seasonal, the Trend Projection Method was used.

• Total results were taken from the result counter of each search engine.

• A forecast of 50 days was generated (Table 1.2).

• The results were evaluated on a scatter graph with a regression line.
Table 1.1: Time Series Data for Forecasting of Select Search Engines for the Keyword “Citation”
The time-series forecasting method fits a trend line to a series of historical data points and then projects the line
into the future for medium- to long-range forecasts. The study describes the trend component visually, by fitting a
line to a set of points on a graph. The graph, however, is subject to slightly different interpretations. There are
three types of trend projection, viz.,

• Positive Secular Trend or Upward Secular Trend: the data follow an upward or rising trend line.

• Negative Secular Trend or Downward Secular Trend: the data follow a falling trend line.

• Neutral Secular Trend or Straight Secular Trend: there is no change; the data are consistent.
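One way to make the three categories operational is to look at the sign of the slope of a straight line fitted to the series. The sketch below is an assumption about how such a classification could be coded, not the procedure used in the study; the function names and the `tolerance` parameter are hypothetical.

```python
def trend_slope(values):
    """Least-squares slope of values against day numbers 1..n."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    days = range(1, n + 1)
    mean_t = sum(days) / n
    mean_y = sum(values) / n
    numerator = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, values))
    denominator = sum((t - mean_t) ** 2 for t in days)
    return numerator / denominator


def classify_trend(values, tolerance=0.0):
    """Return 'positive', 'negative' or 'neutral' from the sign of the slope."""
    slope = trend_slope(values)
    if slope > tolerance:
        return "positive"
    if slope < -tolerance:
        return "negative"
    return "neutral"
```

A non-zero `tolerance` lets very small slopes count as neutral, which may be useful when result counts differ only by noise.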

For the study, 400 samples (100 days for each of the four search engines) were taken into account to generate 200
results of projected data (50 days for each search engine), which are described in graphs.


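The formula used for the study is:

$$\hat{y}_t = b_0 + b_1 t$$

$b_0$ and $b_1$ can be derived as:

$$b_0 = \bar{y} - b_1 \bar{t}$$

$$b_1 = \frac{n \sum t\, y_t - \sum t \sum y_t}{n \sum t^2 - \left(\sum t\right)^2}$$

Where
$t$ = days

$y_t$ = result of the search query

$n$ = number of observations

$\bar{t}$, $\bar{y}$ = means of $t$ and $y_t$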
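The least-squares trend line above can be sketched in Python as follows. This is a minimal illustration, assuming the observed counts are numbered as days 1 to n and the projection continues from day n + 1; the function names `fit_trend` and `project` are hypothetical, and the sample series is invented for the example rather than taken from Table 1.1.

```python
def fit_trend(y):
    """Return (b0, b1) for the line y_t = b0 + b1 * t, with t = 1..n."""
    n = len(y)
    if n < 2:
        raise ValueError("at least two observations are needed")
    t = list(range(1, n + 1))
    sum_t = sum(t)
    sum_y = sum(y)
    sum_ty = sum(ti * yi for ti, yi in zip(t, y))
    sum_t2 = sum(ti * ti for ti in t)
    b1 = (n * sum_ty - sum_t * sum_y) / (n * sum_t2 - sum_t ** 2)
    b0 = sum_y / n - b1 * sum_t / n          # mean(y) - b1 * mean(t)
    return b0, b1


def project(b0, b1, n_observed, horizon):
    """Project the fitted line for `horizon` days after the last observed day."""
    first = n_observed + 1
    return [b0 + b1 * t for t in range(first, first + horizon)]


# Invented series for illustration only.
observed = [10, 12, 14, 16]
b0, b1 = fit_trend(observed)
forecast = project(b0, b1, len(observed), 2)
```

For the study's setting, `observed` would hold the 100 daily counts of one search engine and `horizon` would be 50.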

The projected results (Table 1.2) show wide fluctuation, in terms of both positive and negative secular trends. The
estimate is given by a trend line.


Table 1.2: Projected Data using Trend Projection Method for 50 Days for the Keyword “Citation”

Days  Google     Bing      Yahoo!     Baidu
1     444110909  14375697  121842424  29490727
2     442724208  14360680  121943900  29411076
3     441331951  14342817  122051514  29330674
4     438995050  14321736  122165608  29247484
5     437461197  14299346  122263513  29161381
6     435895800  14273209  122389412  29072238
7     434273851  14250166  122498566  28992039
8     432590996  14225861  122637515  28910007
9     430893525  14200225  122784692  28826090
10    429157853  14173183  122966624  28740233
11    427382931  14147325  123134216  28639048
12    425567680  14120220  123311677  28534216
13    423710990  14089013  123499576  28419993
14    421811719  14053243  123698515  28301120
15    419839598  14021143  123909133  28186094
16    417819084  13990423  124132110  28066973
17    415688084  13970496  124337864  27937508
18    413498585  13950589  124553927  27802857
19    411216980  13924423  124780868  27662762
20    408901062  13894385  124987174  27516951
21    406519893  13873111  125234612  27365139
22    404037735  13845206  125461333  27207020
23    401447614  13816594  125697536  27028696
24    398949368  13769981  126012799  26852199
25    396351680  13727657  126346510  26678429
26    393540530  13693960  126628273  26486910
27    390640023  13666551  126922572  26264959
28    387646378  13638955  127230083  26031417
29    384480459  13626222  127513951  25793119
30    381204575  13615073  127884568  25542688
31    377813649  13601779  128236052  25279414
32    374341700  13558422  128603911  25002541
33    370788167  13533039  128989069  24703260
34    367111948  13507814  129392514  24387791
35    363224919  13499282  129774082  24063387
36    359110342  13446907  130171364  23717518
37    354877520  13391594  130585198  23352825
38    350480623  13333156  131016477  22968143
39    345736646  13271397  131422516  22601497
40    340835205  13206106  131842922  22216817
41    335815732  13141540  132278390  21754720
42    330587432  13068948  132775112  21283781
43    325184907  12992128  133293693  20786168
44    319555219  12910804  133835379  20260157
45    313402091  12829406  134401502  19722811
46    304951667  12743441  134993487  19184171
47    297902254  12647764  135661343  18616915
48    288540872  12551244  136363892  18019215
49    280467757  12449160  137103338  17374200
50    272025021  12331081  137932344  16692922
[Figures 1.3 to 1.6: trend graphs plotted against Days]

Figure 1.3: Negative Secular Trend of Google for the Keyword “Citation”

Figure 1.4: Negative Secular Trend of Bing for the Keyword “Citation”

Figure 1.5: Positive Secular Trend of Yahoo! for the Keyword “Citation”

Figure 1.6: Negative Secular Trend of Baidu for the Keyword “Citation”
CONCLUSIONS

The trends of the search engines reveal that Google and Bing both show a negative secular trend. Yahoo! shows an
upward or positive secular trend, whereas Baidu shows a negative secular trend. The forecasted data show consistent
growth in the database of Yahoo! in terms of result fluctuation. Google and Baidu drop, showing a downward secular
trend that indicates a loss in the database.